What We Will Be Covering





An overview of the hardware 
A basic rendering pipeline 
How to improve performance 
Underused capacities

PS2 design techniques 
Questions... 


What We Will Not Be Covering 





A MIPS programming course 

Showing any sample code 

The price of beer (I am so glad it is cheap!)
A PS2 in chocolate (ummm...tasty!) 





Basic PS2 Architecture 








IOP: Input Output Processor
SPU2: Sound Processor
EE: 128-bit Emotion Engine
GS: Graphic Synthesiser
VU0/VU1: Vector Units
DMA: Direct Memory Access
FPU: Floating Point Unit
IPU: Image Processing Unit
Memory: 32MB


Caches And Scratchpad 


























[Diagram: Emotion Engine and GS (4MB) connected by a 128-bit bus]


















* Similar to old style PC L1 cache.
* PS2 has small caches, as it was felt that a lot of dynamic data would not be in the cache for any length of time.





Programmers 





EE Vector Units 





Graphic Synthesiser 





























































* Each vector unit can do 4 multiplies and 4 adds in a single instruction and can transform about 36 million vertices/sec.
* Both can operate in Micromode: LIW architecture (32 bits * 2).
* It is argued that, due to the PS2 architecture, the PC paradigm started to shift with the emergence of Vertex Shaders.
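
As a rough illustration of why four multiplies plus four adds per instruction suit vertex transformation, the sketch below models a 4-wide multiply-add in plain C and uses four of them to apply a 4x4 matrix to a vertex. The `vec4` type and the function names are invented for this sketch; real VU code would be microcode, not C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 4-component vector (x, y, z, w) used only in this sketch. */
typedef struct { float v[4]; } vec4;

/* Model of one 4-wide multiply-add: acc + col * s in all four lanes. */
vec4 madd4(vec4 acc, vec4 col, float s)
{
    vec4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = acc.v[i] + col.v[i] * s;
    return r;
}

/* Transform a vertex by a column-major 4x4 matrix using four multiply-adds. */
vec4 transform(const vec4 cols[4], vec4 p)
{
    vec4 r = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    assert(cols != NULL);
    for (int c = 0; c < 4; ++c)
        r = madd4(r, cols[c], p.v[c]);
    return r;
}
```

With a matrix whose fourth column holds a translation of (10, 20, 30), the point (1, 2, 3, 1) comes out as (11, 22, 33, 1) after four multiply-add steps.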


Primitives per second:
* 150 million points
* 50 million textured sprites
* 75 million untextured triangles
* 37.5 million textured triangles

Features:
* Alpha blend, Z-test, bilinear/trilinear filtering.
* Efficient scissoring and a fill rate of 2.4 gigapixels a sec.





GIF Connection For VU1

* Vector Unit 1 has a dedicated output path to the GIF.
* It also has a much larger internal memory than VU0 to support double buffering of input and output data.
* This enables fast transformation and output to GS of patterned data.

Fill Rate

* Bandwidth of 4MB embedded DRAM: 48GB/sec
  - Bandwidth of frame buffer: 38.4GB/sec
  - Texture bandwidth: 9.6GB/sec
* Fill rate 1.2 gigapixels a sec for textured
* Fill rate 2.4 gigapixels a sec for untextured
IOP, SPU And DMA

* The IOP processor comes from the PS1; this solves compatibility!
* The DMA bus has a bandwidth of 2.4GB/sec, faster than AGPx8 which is (in theory!) 2.1GB/sec.
* The DMA bus controls all data transfers in the system.
* The DMAC will not stall the CPU when transferring data.
* DMA transfers must be aligned to 128 bits.
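
The 128-bit alignment rule can be sketched in portable C as follows. This is only a host-side model: `bytes_to_qwords`, `is_qword_aligned`, `dma_transfer_qwords` and `packet` are names made up for illustration, and nothing here touches real DMAC registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QWORD_BYTES 16u  /* one quadword = 128 bits */

/* Round a byte count up to a whole number of 128-bit quadwords. */
size_t bytes_to_qwords(size_t bytes)
{
    return (bytes + QWORD_BYTES - 1) / QWORD_BYTES;
}

/* True when an address sits on a 128-bit boundary. */
int is_qword_aligned(const void *p)
{
    return ((uintptr_t)p % QWORD_BYTES) == 0;
}

/* Model of starting a transfer: refuse unaligned data, report the size. */
size_t dma_transfer_qwords(const void *start, size_t bytes)
{
    assert(is_qword_aligned(start));
    return bytes_to_qwords(bytes);
}

/* Example buffer forced onto a 16-byte boundary with C11 _Alignas. */
_Alignas(16) unsigned char packet[64];
```

Declaring buffers with an explicit alignment up front is usually simpler than fixing up addresses just before a transfer.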





Sony Computer Entertainment Europe, AGDC


DMA Data Transfer

[Diagram: main memory connected to Device1, Device2, Device3 and Device4]

* Dedicated channel for each device.
* Time sliced: 8 qwords to device 1, 8 qwords to device 2, 8 qwords to device 3, then repeat.
* To send data through a channel you just specify the start address, the data size and a start signal to the DMAC.
* NOTE: DMA bypasses the cache.

DMA Chains

* Built from a list of tags; can contain many data types.
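
A DMA chain built from a list of tags can be pictured with the simplified model below. The `dma_tag` structure is a deliberately hypothetical layout for illustration, not the hardware tag format, and `chain_total_qwords` only mimics how a controller would follow tags until it reaches an end tag.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical tag kinds: NOT the hardware bit layout. */
typedef enum { TAG_NEXT, TAG_END } tag_kind;

typedef struct dma_tag {
    tag_kind              kind;  /* continue to `next`, or stop here */
    size_t                qwc;   /* payload size in 128-bit quadwords */
    const void           *data;  /* payload: vertices, textures, ... */
    const struct dma_tag *next;  /* following tag when kind == TAG_NEXT */
} dma_tag;

/* Walk the chain tag by tag and return the total quadwords it describes. */
size_t chain_total_qwords(const dma_tag *tag)
{
    size_t total = 0;
    while (tag != NULL) {
        total += tag->qwc;
        if (tag->kind == TAG_END)
            break;
        assert(tag->next != NULL);  /* a TAG_NEXT must point somewhere */
        tag = tag->next;
    }
    return total;
}
```

Because each tag carries its own pointer and size, one chain can mix different kinds of data without first copying them into a single contiguous buffer.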








Basic Rendering Pipeline

[Diagram: CPU with coprocessor VU0, list processing, DMA, VU1]

How To Improve PS2 Performance

* By not treating the PS2 as a PC
* By using texture sizes and formats
* Prevent the thrashing of Texture Cache
* Without abusing Instruction and Data Cache
1st Attempt At A PC Port
(max 0.5 million polys)

[Diagram: memory, geometry and texture, transformation]
2nd Attempt At A PC Port
(max 1.5 million polys)

[Diagram: IOP, SPU, IPU; memory; DMA bus (2.4GB/sec) carrying geometry and texture; transformation in parallel with CPU]
VU Renderer (lighting, no animation)
(typical 10-20 million polys)

Complete Game (lighting, animation)
(typical 5-10 million polys)

[Diagram, both cases: IOP, SPU, IPU; memory; DMA bus (2.4GB/sec); texture; transformation]
VRAM Layout

* 4MB embedded memory
* 4MB of VRAM is split into 8K pages
  - Pages split into 32 blocks of 256 bytes
* Frame buffers addressed by page
* Textures addressed by block
  - Allowing multiple textures per page
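
The page and block figures above can be checked with a little arithmetic. The sketch assumes blocks are numbered linearly through VRAM, which is a simplification for illustration; the function names are made up here.

```c
#include <assert.h>

enum {
    VRAM_BYTES      = 4 * 1024 * 1024, /* 4MB embedded memory */
    PAGE_BYTES      = 8 * 1024,        /* 8K per page */
    BLOCKS_PER_PAGE = 32,
    BLOCK_BYTES     = 256
};

/* Number of 8K pages in VRAM. */
unsigned vram_pages(void)  { return VRAM_BYTES / PAGE_BYTES; }

/* Number of 256-byte blocks in VRAM. */
unsigned vram_blocks(void) { return VRAM_BYTES / BLOCK_BYTES; }

/* Page that contains a given block, under the linear numbering assumption. */
unsigned block_to_page(unsigned block)
{
    assert(block < VRAM_BYTES / BLOCK_BYTES);
    return block / BLOCKS_PER_PAGE;
}
```

This gives 512 pages and 16384 blocks, and confirms that 32 blocks of 256 bytes fill one 8K page exactly.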


By Using Texture Size And Format

* 4MB of VRAM is split into 8K pages
  - Pages split into 32 blocks of 256 bytes
* Block position varies based on format
* Possible to store multiple textures in 1 page
* E.g. 16-bit texture page [diagram]








GS Coordinate System

* Frame buffers use a 16-bit coordinate system
  - 12-bit integer . 4-bit fraction
  - Full range 0 - 4095.9375
* Typically centre specified as (2048, 2048)
* Scissoring area is specified relative to this centre

GS Coordinate Scissoring

[Diagram: coordinate space from 0,0 to 4096,4096]

* X and Y values are 16-bit
  - Scissoring will not work outside that range
* No hardware clipping
  - There is a VU clip instruction
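
The 12-bit integer, 4-bit fraction format can be sketched as a plain fixed-point conversion. The helper names are hypothetical, and truncation toward zero is an assumption of this sketch rather than documented hardware behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define GS_FRAC_BITS 4  /* 4 fractional bits, so one unit is 1/16 */

/* Convert a coordinate in [0, 4095.9375] to 12.4 fixed point (truncating). */
uint16_t to_fixed_12_4(float v)
{
    assert(v >= 0.0f && v <= 4095.9375f);
    return (uint16_t)(v * (float)(1 << GS_FRAC_BITS));
}

/* Convert a 12.4 fixed-point value back to floating point. */
float from_fixed_12_4(uint16_t f)
{
    return (float)f / (float)(1 << GS_FRAC_BITS);
}

/* Offset a screen coordinate so the screen is centred on 2048. */
uint16_t screen_to_gs(float screen, float half_extent)
{
    return to_fixed_12_4(2048.0f - half_extent + screen);
}
```

The largest 16-bit value, 0xFFFF, divided by 16 gives the 4095.9375 upper limit quoted above, and the centre 2048 encodes as 0x8000.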
Prevent The Thrashing Of Texture Cache

* Current texels read from Texture Cache
  - Only 8K in size or 1 texture page
  - Costs to reload Texture Cache
* No need to use PC-style 32-bit textures
  - Too many colours, takes up too much VRAM
  - Aiming for TV not a PC monitor
* Texture sizes that fit into Texture Cache
  - 4-bit 128x128, 8-bit 128x64 (with CLUT)
  - 16-bit 64x64, 32-bit 64x32

Instruction And Data Cache Issues

* Cache issues
  - Large loops and jumps
  - Large objects/structures
  - Consider the cost of useful C++ features (e.g. templates); they can have a negative effect
* What can help?
  - Breaking large loops into several smaller loops
  - Check the disassembly [...]
  - Un-cached memory access (0x20000000)
  - Scratchpad is the fastest memory you have direct access to; use it as a main work area.
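
The texture sizes listed for the 8K Texture Cache can be verified by computing each footprint. This sketch ignores CLUT storage, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TEXTURE_CACHE_BYTES 8192u  /* 8K, i.e. one texture page */

/* Size in bytes of a width x height texture at the given bits per texel. */
size_t texture_bytes(unsigned width, unsigned height, unsigned bpp)
{
    assert(bpp == 4 || bpp == 8 || bpp == 16 || bpp == 32);
    return ((size_t)width * height * bpp) / 8;
}

/* True when the whole texture fits in the Texture Cache at once. */
int fits_texture_cache(unsigned width, unsigned height, unsigned bpp)
{
    return texture_bytes(width, height, bpp) <= TEXTURE_CACHE_BYTES;
}
```

All four listed sizes come to exactly 8192 bytes, whereas a 128x128 texture at 32 bits needs 65536 bytes, eight times the cache.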





Vector Unit 0 Usage

* Suggested for taking some work off the CPU and helping reduce I$ misses.
* It's not recommended to use VU0 in Macromode.
* Use Micromode and allow the CPU to carry on in parallel.

VIF Data Compression/Decompression

* Compressed formats reduce memory size of model.
* Decompression from packed formats by the VIF provides a reduction in load on the VU.
* [Diagram: 8-bit data expanded to 32-bit]
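
The 8-bit to 32-bit expansion can be modelled in plain C as below. This only illustrates the idea of storing model data packed and widening it on the way in; it is not VIF code, and the choice of sign extension is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand signed 8-bit components to 32-bit, one component at a time. */
void unpack_s8_to_s32(const int8_t *src, int32_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i)
        dst[i] = (int32_t)src[i];  /* sign-extend each component */
}

/* Storage needed for `components` values in packed 8-bit form. */
size_t packed_bytes(size_t components)
{
    return components * sizeof(int8_t);
}

/* Storage needed for the same values once expanded to 32-bit. */
size_t unpacked_bytes(size_t components)
{
    return components * sizeof(int32_t);
}
```

Four components take 4 bytes packed against 16 bytes expanded, which is where the saving in model memory comes from.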
Texture And Geometry Streaming

* 1.2GB/sec max bandwidth (24MB/frame).
* GIF arbitrates between paths and packs data into 64 bits for GS.
* Watch priority with paths to the GIF.
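
The 24MB per frame figure is consistent with dividing 1.2GB/sec by 50 frames a second. The 50Hz rate is an assumption of this sketch, not something stated on the slide, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that can be moved in one frame at a given bandwidth and frame rate. */
uint64_t bytes_per_frame(uint64_t bytes_per_sec, unsigned frames_per_sec)
{
    assert(frames_per_sec > 0);
    return bytes_per_sec / frames_per_sec;
}

/* How many complete refills of a memory of `size` bytes fit in one frame. */
uint64_t refills_per_frame(uint64_t frame_budget, uint64_t size)
{
    assert(size > 0);
    return frame_budget / size;
}
```

At these theoretical rates the 4MB of VRAM could be rewritten about five times per frame, which is why streaming textures is workable despite the small VRAM.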


PS2 to PC Programme 


Summary 





The key to PS2 power is keeping the units busy 


Keeping data moving in parallel is the key to 
keeping the processors fed with data. 


DMA is the system which does this. This is the 
most crucial thing to understand to get 
performance on PS2. 


VRAM seems small but there are plenty of tricks. 
Cache issues... remember Scratchpad! 
Vector Unit 0 is underused. 





